Abstract

We introduce a kernel method for manifold alignment (KEMA) and domain adaptation that can match an arbitrary number of data sources without needing corresponding pairs, just a few labeled examples in all domains. KEMA has interesting properties: 1) it generalizes other manifold alignment methods, 2) it can align manifolds of very different complexities, performing a sort of manifold unfolding plus alignment, 3) it can define a domain-specific metric to cope with multimodal specificities, 4) it can align data spaces of different dimensionality, 5) it is robust to strong nonlinear feature deformations, and 6) it is closed-form invertible, which allows transfer across domains and data synthesis. To the authors' knowledge, this is the first method to address all these important issues at once. We also present a reduced-rank version for computational efficiency and discuss the generalization performance of KEMA under Rademacher principles of stability. KEMA exhibits very good performance over competing methods in synthetic examples, visual object recognition, and facial expression recognition tasks.


1 Introduction 

Domain adaptation constitutes a field of high interest in pattern analysis and machine learning. Classification algorithms developed with data from one domain cannot be directly used in another related domain, and hence adapting either the classifier or the data representation becomes imperative [1]. In this paper, we focus on the latter pathway, which has been referred to as feature representation transfer [2], feature transformation learning [3] or manifold alignment [4]. Roughly speaking, aligning data manifolds reduces to finding projections to a common latent space where all datasets show similar statistical characteristics. Depending on the availability of labels in the different domains, three families of adaptation problems have been considered in the literature.

Unsupervised adaptation: The first attempts at unsupervised domain adaptation are found in multiview analysis [5], and more precisely in canonical correlation analysis (CCA) and kernel CCA (KCCA) [6]. Despite their generally good performance, they still require points in different sources to be corresponding pairs, a condition that is often hard to meet in real applications. Alternative methods seek a set of projectors that minimize a measure of discrepancy between the source and target data distributions, such as the Maximum Mean Discrepancy (MMD) [7] or the recent geodesic distance between distributions [8]. However, to compare distributions, the data are assumed to be represented by the same features in all domains. The idea of exploiting geodesic distances along manifolds was also considered in [9], where a finite set of intermediate transformed data distributions is sampled along the geodesic flow (SGF) between the linear subspaces. The intermediate features are then used to train the classifier. The idea was extended in [10], where a Geodesic Flow Kernel (GFK) was constructed by considering the infinite set of transformed subspaces along the geodesic path. However, both SGF and GFK assume input data spaces of the same dimensionality.

Semi-supervised adaptation with labels in the source domain only: Some of the above-mentioned methods can incorporate the information of labeled samples in the source domain. For example, SGF [9] and GFK [10] become semi-supervised if the eigenvectors of the source domain are found with a discriminative feature extractor such as partial least squares (PLS). Another family of methods, collectively known as Optimal Transport (OT) techniques, uses labeled samples in the source domain to maximize coherence in the transportation plan of masses between source and target domains [11].

Supervised adaptation with labels in all domains: SGF and GFK can also be defined for the case in which all the domains are labeled. Alternative approaches try to align target and source features while simultaneously moving labeled examples to the correct side of the decision hyperplane (MMDT) [12]. A last family of supervised methods is known as manifold alignment, and aims at concurrently matching the corresponding instances while preserving the topology of each input domain, generally using a graph Laplacian [4, 13]. While appealing, these methods still require specifying a small number of cross-domain sample correspondences. The problem was addressed in [14] by replacing the constraint of paired correspondences with the constraint of having the same class labels in all domains. The semi-supervised manifold alignment (SSMA) method proposed in [14] projects data from different domains to a latent space where samples belonging to the same class become closer, those of different classes are pushed far apart, and the geometry of each domain is preserved. The method performs well in general and can deal with multiple domains of different dimensionality. However, SSMA cannot cope with strong nonlinear deformations and high-dimensional data problems.

This paper introduces a generalization of SSMA through kernelization. The proposed Kernel Manifold Alignment (KEMA) has appealing properties: (1) it reduces to SSMA when using a linear kernel, which allows us to deal with high-dimensional data efficiently in the dual form (Q-mode analysis); by this property, KEMA can cope with input spaces of very large dimension, e.g. those extracted by Fisher vectors or deep features; (2) it goes beyond data rotations, so it can align manifolds of very different structures, performing a sort of manifold unfolding simultaneous to the alignment; (3) it can also define a domain-specific metric by using different kernel functions in the different domains; (4) like SSMA, KEMA can align data spaces of different dimensionality; (5) it is robust to strong (nonlinear) deformations of the manifolds to be aligned, as the kernel compensates for problems in graph estimation and numerical problems; and (6) mapping inversion (and hence data synthesis) can be performed in closed form without the need of pre-images, which permits measuring the quality of the alignment in meaningful physical units.

The remainder of the paper is organized as follows. Section 2 briefly reviews the main properties of the SSMA algorithm. Section 3 introduces the KEMA formulation and analyzes its theoretical and practical properties. Section 4 presents the experimental evaluation of the algorithm. We compare KEMA to SSMA and related (linear and kernel) methods in toy examples and real visual object and face recognition problems. We conclude with some remarks in Section 5.


2 Semi-supervised Manifold Alignment 

Let us consider $D$ domains $\mathcal{X}^i$ representing similar classification problems. The corresponding data matrices, $\mathbf{X}_i \in \mathbb{R}^{d_i \times n_i}$, $i = 1, \dots, D$, contain $n_i$ examples (labeled, $l_i$, and unlabeled, $u_i$, with $n_i = l_i + u_i$) of dimension $d_i$, and $n = \sum_{i=1}^{D} n_i$. The SSMA method [14] maps all the data to a latent space $\mathcal{F}$ such that samples belonging to the same class become closer, those of different classes are pushed far apart, and the geometry of the data manifolds is preserved. Therefore, three entities have to be considered, leading to three $n \times n$ matrices: 1) a similarity matrix $\mathbf{W}_s$ that has components $W_s^{ij} = 1$ if $\mathbf{x}_i$ and $\mathbf{x}_j$ belong to the same class, and 0 otherwise (including unlabeled samples); 2) a dissimilarity matrix $\mathbf{W}_d$, which has entries $W_d^{ij} = 1$ if $\mathbf{x}_i$ and $\mathbf{x}_j$ belong to different classes, and 0 otherwise (including unlabeled samples); and 3) a similarity matrix $\mathbf{W}$ that represents the topology of a given domain, e.g. a radial basis function (RBF) kernel or a $k$ nearest neighbors graph, computed for each domain separately and joined in a block-diagonal matrix. The three different entities lead to three different graph Laplacians: $\mathbf{L}_s$, $\mathbf{L}_d$, and $\mathbf{L}$, respectively. Then, the SSMA embedding minimizes a joint cost function whose solution is given by the eigenvectors corresponding to the smallest non-zero eigenvalues of the following generalized eigenvalue problem:

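$$\mathbf{Z}(\mathbf{L} + \mu\mathbf{L}_s)\mathbf{Z}^\top\mathbf{V} = \lambda\,\mathbf{Z}\mathbf{L}_d\mathbf{Z}^\top\mathbf{V},$$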

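where $\mathbf{Z}$ is a block diagonal matrix containing the data matrices $\mathbf{X}_i$, $\mu$ is a parameter weighting the class-similarity term against the geometry term, and $\mathbf{V}$ contains the eigenvectors in its columns, organized in row blocks for each domain, $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_D]^\top$. The method allows one to extract a maximum of $N_f = \sum_{i=1}^{D} d_i$ features that serve for projecting the data to the common latent domain as follows: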

$$P_{\mathcal{F}}(\mathbf{X}_i) = \mathbf{v}_i^\top\mathbf{X}_i.$$
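The SSMA pipeline described above (three graphs, three Laplacians, one generalized eigenproblem, per-domain projections) can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the reference implementation of [14]: samples are stored as columns, unlabeled samples carry the label -1, the geometry graph W is a symmetric k-NN graph per domain (k = 5 is arbitrary), the eigenproblem is solved through a Cholesky factor of the slightly regularized right-hand matrix, and all helper names (block_diag, laplacian, knn_graph, gen_eigh, ssma) are hypothetical.

```python
import numpy as np

# Illustrative sketch: helper names are hypothetical, not taken from the paper.

def block_diag(*mats):
    """Place matrices along the diagonal of a zero matrix (stand-in for scipy.linalg.block_diag)."""
    out = np.zeros((sum(m.shape[0] for m in mats), sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r, c = r + m.shape[0], c + m.shape[1]
    return out

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def knn_graph(X, k):
    """Symmetric k-NN adjacency for the columns of X (d x n)."""
    d2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros_like(d2)
    for i, row in enumerate(d2):
        W[i, np.argsort(row)[1:k + 1]] = 1.0   # index 0 is the point itself
    return np.maximum(W, W.T)

def gen_eigh(A, B, reg=1e-6):
    """Solve A v = lam B v (A symmetric, B symmetric PSD) via Cholesky of B + reg*I.
    Eigenvalues come back in ascending order."""
    B = (B + B.T) / 2 + reg * np.eye(B.shape[0])
    Ci = np.linalg.inv(np.linalg.cholesky(B))  # B = C C^T
    lam, E = np.linalg.eigh(Ci @ A @ Ci.T)
    return lam, Ci.T @ E

def ssma(Xs, ys, mu=1.0, k=5):
    """Xs: list of d_i x n_i data matrices; ys: label vectors (-1 = unlabeled)."""
    y = np.concatenate(ys)
    both = (y[:, None] >= 0) & (y[None, :] >= 0)             # both samples labeled
    Ws = ((y[:, None] == y[None, :]) & both).astype(float)   # same class
    Wd = ((y[:, None] != y[None, :]) & both).astype(float)   # different classes
    L = laplacian(block_diag(*[knn_graph(X, k) for X in Xs]))
    Z = block_diag(*Xs)
    lam, V = gen_eigh(Z @ (L + mu * laplacian(Ws)) @ Z.T, Z @ laplacian(Wd) @ Z.T)
    cuts = np.cumsum([X.shape[0] for X in Xs])[:-1]
    return lam, np.split(V, cuts, axis=0)  # per-domain blocks v_i, each d_i x N_f

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(2, 30)), rng.normal(size=(3, 40))   # d_1 = 2, d_2 = 3
y1 = np.r_[np.repeat([0, 1, 2], 5), -np.ones(15, dtype=int)]
y2 = np.r_[np.repeat([0, 1, 2], 5), -np.ones(25, dtype=int)]
lam, (v1, v2) = ssma([X1, X2], [y1, y2])
F1, F2 = v1.T @ X1, v2.T @ X2   # latent projections P_F(X_i) = v_i^T X_i
```

The sketch returns all $N_f = d_1 + d_2$ eigenvectors in ascending order of eigenvalue; following the text, one would keep those associated with the smallest non-zero eigenvalues.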

Advantageously, SSMA can easily project data between domains $j$ and $i$: first mapping the data in $\mathcal{X}^j$ to the latent domain $\mathcal{F}$, and from there inverting back to the target domain $\mathcal{X}^i$ as follows:

$$P_i(\mathbf{X}_j) = (\mathbf{v}_j\mathbf{v}_i^\dagger)^\top\mathbf{X}_j,$$

where $\dagger$ denotes the pseudo-inverse of the eigenvectors of the target domain. Therefore, the method can be used for domain adaptation but also for data synthesis.
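As a hedged illustration of this transfer formula, the snippet below maps samples from domain $j$ into domain $i$ through the latent space using NumPy's pseudo-inverse. The eigenvector blocks are random stand-ins (of sizes $d_j \times N_f$ and $d_i \times N_f$) rather than an actual SSMA solution, and the function name `transfer` is hypothetical.

```python
import numpy as np

def transfer(Xj, vj, vi):
    """Hypothetical helper: P_i(X_j) = (v_j v_i^+)^T X_j, i.e. project the columns of X_j
    to the latent space with v_j, then map back to domain i with the pseudo-inverse of v_i."""
    return (vj @ np.linalg.pinv(vi)).T @ Xj

rng = np.random.default_rng(1)
vj = rng.normal(size=(3, 2))   # stand-in for the domain-j eigenvector block (d_j = 3, N_f = 2)
vi = rng.normal(size=(2, 2))   # stand-in for the domain-i block (d_i = 2, N_f = 2)
Xj = rng.normal(size=(3, 10))  # ten samples from domain j
Xi_hat = transfer(Xj, vj, vi)  # synthesized samples in domain i, shape (2, 10)

# Since v_i is square and invertible here, re-projecting the synthesized samples
# recovers the latent coordinates of the original domain-j samples.
print(np.allclose(vi.T @ Xi_hat, vj.T @ Xj))
```

When $\mathbf{v}_i$ is not square, the pseudo-inverse returns the minimum-norm least-squares preimage of the latent coordinates in domain $i$.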
3 Kernel manifold alignment


In order to kernelize the previous method, one needs to first map the data to a Hilbert space, apply the representer theorem, and replace the dot products therein with reproducing kernel functions. Let us first map the $D$ different datasets to $D$ possibly different Hilbert spaces $\mathcal{H}_i$ of dimension $H_i$, $\boldsymbol{\phi}_i(\cdot): \mathbf{x} \mapsto \boldsymbol{\phi}_i(\mathbf{x}) \in \mathcal{H}_i$, $i = 1, \dots, D$.

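Now, by replacing all the samples with their mapped feature vectors, the problem becomes:

$$\boldsymbol{\Phi}(\mathbf{L} + \mu\mathbf{L}_s)\boldsymbol{\Phi}^\top\mathbf{U} = \lambda\,\boldsymbol{\Phi}\mathbf{L}_d\boldsymbol{\Phi}^\top\mathbf{U},$$

where $\boldsymbol{\Phi}$ is a block diagonal matrix containing the mapped data matrices $\boldsymbol{\Phi}_i = [\boldsymbol{\phi}_i(\mathbf{x}_1), \dots, \boldsymbol{\phi}_i(\mathbf{x}_{n_i})]$ and $\mathbf{U}$ contains the eigenvectors organized in row blocks for each domain, each defined in its Hilbert space $\mathcal{H}_i$, $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_D]^\top$, where $H = \sum_{i=1}^{D} H_i$ is the dimension of the direct sum $\mathcal{H} = \mathcal{H}_1 \oplus \dots \oplus \mathcal{H}_D$. This operation is possible thanks to the direct sum of Hilbert spaces, a well-known construction in functional analysis [15]. Note that the eigenvectors are of possibly infinite dimension and cannot be computed explicitly. Instead, we resort to $D$ corresponding Riesz representation theorems [16], so that the eigenvectors can be expressed as linear combinations of mapped samples [17], $\mathbf{u}_i = \boldsymbol{\Phi}_i\boldsymbol{\alpha}_i$, and in matrix notation $\mathbf{U} = \boldsymbol{\Phi}\boldsymbol{\Lambda}$. This leads to the problem: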

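$$\boldsymbol{\Phi}(\mathbf{L} + \mu\mathbf{L}_s)\boldsymbol{\Phi}^\top\boldsymbol{\Phi}\boldsymbol{\Lambda} = \lambda\,\boldsymbol{\Phi}\mathbf{L}_d\boldsymbol{\Phi}^\top\boldsymbol{\Phi}\boldsymbol{\Lambda}. \tag{1}$$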

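Now, by premultiplying both sides by $\boldsymbol{\Phi}^\top$ and replacing the dot products with the corresponding kernel matrices, $\mathbf{K} = \boldsymbol{\Phi}^\top\boldsymbol{\Phi}$, we obtain the final solution:

$$\mathbf{K}(\mathbf{L} + \mu\mathbf{L}_s)\mathbf{K}\boldsymbol{\Lambda} = \lambda\,\mathbf{K}\mathbf{L}_d\mathbf{K}\boldsymbol{\Lambda},$$

where $\mathbf{K}$ is a block diagonal matrix containing the kernel matrices $\mathbf{K}_i$. Now the eigenproblem becomes of size $n \times n$ instead of $d \times d$, and we can extract a maximum of $N_f = n$ features.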
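The dual eigenproblem can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: RBF kernels are used in every domain (with an arbitrary $\sigma = 1$), the same per-domain RBF kernels double as the geometry graph $\mathbf{W}$ (one of the options listed in Section 2), a small ridge `reg` keeps the right-hand matrix positive definite, and all helper names are hypothetical. Note that the two domains have different input dimensions.

```python
import numpy as np

# Illustrative sketch: helper names and parameter values are hypothetical.

def block_diag(*mats):
    """Place matrices along the diagonal of a zero matrix."""
    out = np.zeros((sum(m.shape[0] for m in mats), sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r, c = r + m.shape[0], c + m.shape[1]
    return out

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def gen_eigh(A, B, reg=1e-6):
    """Solve A v = lam B v via Cholesky of B + reg*I; ascending eigenvalues."""
    B = (B + B.T) / 2 + reg * np.eye(B.shape[0])
    Ci = np.linalg.inv(np.linalg.cholesky(B))
    lam, E = np.linalg.eigh(Ci @ A @ Ci.T)
    return lam, Ci.T @ E

def rbf_kernel(X, Y, sigma):
    """K[a, b] = exp(-||x_a - y_b||^2 / (2 sigma^2)) for the columns of X and Y."""
    d2 = (X ** 2).sum(0)[:, None] + (Y ** 2).sum(0)[None, :] - 2 * X.T @ Y
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))

def kema(Ks, L, Ls, Ld, mu=1.0, reg=1e-6):
    """Solve K(L + mu Ls)K Lam = lam K Ld K Lam; return eigenvalues and per-domain alpha_i."""
    K = block_diag(*Ks)
    lam, Lam = gen_eigh(K @ (L + mu * Ls) @ K, K @ Ld @ K, reg)
    cuts = np.cumsum([Ki.shape[0] for Ki in Ks])[:-1]
    return lam, np.split(Lam, cuts, axis=0)   # alpha_i blocks, each n_i x N_f

rng = np.random.default_rng(0)
Xs = [rng.normal(size=(2, 20)), rng.normal(size=(5, 25))]  # different input dimensions
ys = [np.r_[np.repeat([0, 1], 4), -np.ones(12, dtype=int)],
      np.r_[np.repeat([0, 1], 4), -np.ones(17, dtype=int)]]
y = np.concatenate(ys)
both = (y[:, None] >= 0) & (y[None, :] >= 0)
Ls = laplacian(((y[:, None] == y[None, :]) & both).astype(float))
Ld = laplacian(((y[:, None] != y[None, :]) & both).astype(float))
Ks = [rbf_kernel(X, X, sigma=1.0) for X in Xs]
L = laplacian(block_diag(*Ks))            # RBF similarities used as the geometry graph W
lam, (a1, a2) = kema(Ks, L, Ls, Ld)
Z1, Z2 = a1.T @ Ks[0], a2.T @ Ks[1]       # latent projections alpha_i^T K_i
```

The eigenproblem is $n \times n$ ($n = 45$ here) regardless of the input dimensions, which is what lets the same code align a 2-D domain with a 5-D one.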

Kernel generalization: When a linear kernel is used for all the domains, $\mathbf{K}_i = \mathbf{X}_i^\top\mathbf{X}_i$, KEMA reduces to SSMA:

$$P_{\mathcal{F}}(\mathbf{X}_i) = \boldsymbol{\alpha}_i^\top\mathbf{X}_i^\top\mathbf{X}_i = (\mathbf{X}_i\boldsymbol{\alpha}_i)^\top\mathbf{X}_i = \mathbf{v}_i^\top\mathbf{X}_i.$$

This dual formulation is advantageous when dealing with very high-dimensional datasets, $d_i \gg n_i$, for which the SSMA problem is not well-conditioned. Operating in Q-mode provides the method with numerical stability and computational efficiency in current high-dimensional problems, e.g. when using Fisher vectors or deep features.
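The reduction to SSMA under a linear kernel is easy to check numerically. In the hedged sketch below, the dual coefficients are random stand-ins rather than the solution of the eigenproblem, and the sizes ($d = 10{,}000$ features, $n = 20$ samples) are arbitrary; the point is that the dual route only ever forms the $n \times n$ matrix $\mathbf{X}^\top\mathbf{X}$ while producing the same projections as the primal route.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_f = 10_000, 20, 3            # d_i >> n_i, as in the high-dimensional setting
X = rng.normal(size=(d, n))          # samples as columns
alpha = rng.normal(size=(n, n_f))    # stand-in dual coefficients alpha_i (not a real solution)

K = X.T @ X                          # linear kernel: only an n x n matrix is needed (Q-mode)
proj_dual = alpha.T @ K              # KEMA projection alpha_i^T K_i
v = X @ alpha                        # implied primal eigenvectors v_i = X_i alpha_i
proj_primal = v.T @ X                # SSMA projection v_i^T X_i
```

In this setting the primal eigenproblem would be $10{,}000 \times 10{,}000$, whereas the dual one is only $20 \times 20$.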

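Projections to kernel latent space: Projection to the latent space requires first mapping the data $\mathbf{X}_i$ to its corresponding Hilbert space $\mathcal{H}_i$, thus leading to the mapped data $\boldsymbol{\Phi}_i$, and then applying the projection vector defined therein:

$$P_{\mathcal{F}}(\mathbf{X}_i) = \mathbf{u}_i^\top\boldsymbol{\Phi}_i = \boldsymbol{\alpha}_i^\top\boldsymbol{\Phi}_i^\top\boldsymbol{\Phi}_i = \boldsymbol{\alpha}_i^\top\mathbf{K}_i. \tag{2}$$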
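For new samples, Eq. (2) is applied with the kernel evaluated between the training samples of domain $i$ and the new points, which is why the cost of feature extraction grows with the number of training samples. The sketch below assumes an RBF kernel and uses random coefficients as a stand-in for a trained $\boldsymbol{\alpha}_i$; `project_latent` is a hypothetical helper name.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """K[a, b] = exp(-||x_a - y_b||^2 / (2 sigma^2)) for the columns of X and Y."""
    d2 = (X ** 2).sum(0)[:, None] + (Y ** 2).sum(0)[None, :] - 2 * X.T @ Y
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))

def project_latent(alpha_i, X_train_i, X_new, sigma):
    """Hypothetical helper for Eq. (2) on unseen data: alpha_i^T K_i(X_train_i, X_new), N_f x m."""
    return alpha_i.T @ rbf_kernel(X_train_i, X_new, sigma)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(4, 30))     # n_i = 30 training samples in domain i
alpha = rng.normal(size=(30, 5))       # stand-in for trained coefficients, N_f = 5
X_new = rng.normal(size=(4, 7))        # m = 7 unseen samples
F_new = project_latent(alpha, X_train, X_new, sigma=1.5)   # shape (5, 7)
```

Each new sample requires $n_i$ kernel evaluations (one per training sample of its domain), which is the cost that the reduced-rank variant of Section 3.1 targets.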
Invertibility has a closed-form solution: In order to map data from $\mathcal{X}^j$ to $\mathcal{X}^i$ with KEMA, we would need to estimate $D - 1$ inverse mappings, which would make KEMA unstable and useless for measuring accuracy in meaningful physical units of the input space. In general, using kernel functions hampers the invertibility of the transformation unless pre-imaging is used, for which some efficient yet inexact solutions exist [18, 19]. Here we propose a simple closed-form solution to the mapping inversion: use a linear kernel for the latent-to-target transformation, $\mathbf{K}_i = \mathbf{X}_i^\top\mathbf{X}_i$, and $\mathbf{K}_j$ for $j \neq i$ with any desired form. Then, projection of data $\mathbf{X}_j$ to the target domain $i$ becomes:

$$P_i(\mathbf{X}_j) = (\mathbf{u}_i^\dagger)^\top\boldsymbol{\alpha}_j^\top\mathbf{K}_j = \big(\boldsymbol{\alpha}_j(\mathbf{X}_i\boldsymbol{\alpha}_i)^\dagger\big)^\top\mathbf{K}_j, \tag{3}$$

where for the target domain we used $\mathbf{u}_i = \mathbf{v}_i = \mathbf{X}_i\boldsymbol{\alpha}_i$. We should note that the solution is not unique, since $D$ different inverse solutions can be obtained depending on the selected target domain.
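A minimal NumPy sketch of Eq. (3), assuming an RBF kernel in the source domain $j$ and the linear kernel $\mathbf{K}_i = \mathbf{X}_i^\top\mathbf{X}_i$ in the target domain $i$, as prescribed above. The coefficient matrices are random stand-ins for the solution of the KEMA eigenproblem, and `invert_to_target` is a hypothetical name.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    """K[a, b] = exp(-||x_a - y_b||^2 / (2 sigma^2)) for the columns of X and Y."""
    d2 = (X ** 2).sum(0)[:, None] + (Y ** 2).sum(0)[None, :] - 2 * X.T @ Y
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))

def invert_to_target(Kj, alpha_j, Xi, alpha_i):
    """Hypothetical helper for Eq. (3): P_i(X_j) = (alpha_j (X_i alpha_i)^+)^T K_j.
    Kj: kernel between the training samples of domain j and the samples to map."""
    vi = Xi @ alpha_i                        # linear-kernel target: u_i = v_i = X_i alpha_i
    return (alpha_j @ np.linalg.pinv(vi)).T @ Kj

rng = np.random.default_rng(2)
Xi = rng.normal(size=(2, 15))                # target domain i: d_i = 2, n_i = 15
Xj = rng.normal(size=(3, 12))                # source domain j: d_j = 3, n_j = 12
alpha_i = rng.normal(size=(15, 2))           # stand-in coefficients, N_f = 2
alpha_j = rng.normal(size=(12, 2))
Kj = rbf_kernel(Xj, Xj, sigma=1.0)           # map the training samples of domain j
Xj_in_i = invert_to_target(Kj, alpha_j, Xi, alpha_i)   # shape (d_i, n_j) = (2, 12)
```

No pre-image is needed: the only inversion is a pseudo-inverse of the $d_i \times N_f$ matrix $\mathbf{X}_i\boldsymbol{\alpha}_i$, which yields the minimum-norm least-squares point in domain $i$ with the given latent coordinates.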

3.1 Reduced rank approximation 

KEMA complexity scales quadratically with $n$ in terms of memory, and cubically in terms of computation time. Feature extraction for new data requires the evaluation of $n$ kernel functions per pattern, which becomes computationally expensive for large $n$. To alleviate this problem, we propose a reduced-rank approximation of the span. The so-called Reduced-Rank Kernel Manifold Alignment (REKEMA) formulation imposes reduced-rank solutions for the projection vectors, $\mathbf{U} = \boldsymbol{\Phi}_r\boldsymbol{\Lambda}$, where $\boldsymbol{\Phi}_r$ is a subset of the training data containing $r$ samples ($r \ll n$) and $\boldsymbol{\Lambda}$ is the new argument of the optimization problem. Plugging this expansion into Eq. (1) in place of $\boldsymbol{\Phi}\boldsymbol{\Lambda}$, premultiplying by $\boldsymbol{\Phi}_r^\top$, and replacing the dot products with the corresponding kernels, we obtain the final solution:

$$\mathbf{K}_{rn}(\mathbf{L} + \mu\mathbf{L}_s)\mathbf{K}_{nr}\boldsymbol{\Lambda} = \lambda\,\mathbf{K}_{rn}\mathbf{L}_d\mathbf{K}_{nr}\boldsymbol{\Lambda},$$

where $\mathbf{K}_{rn}$ is a block diagonal matrix containing the kernel matrices comparing a reduced set of $r$ representative vectors with all $n$ training data points, and $\mathbf{K}_{nr} = \mathbf{K}_{rn}^\top$. REKEMA offers clear benefits for obtaining the projection vectors (the eigenproblem becomes of size $r \times r$ instead of $n \times n$), for compacting the solution (now $N_f = r$ features), and in storage requirements (quadratic with $r$).
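The reduced-rank variant only changes which kernel matrices enter the eigenproblem. The sketch below carries the same caveats as the earlier ones: the $r$ representative samples are drawn uniformly at random within each domain (one possible choice), an RBF kernel with an arbitrary $\sigma$ is assumed, a small ridge keeps the right-hand side positive definite, and all helper names are hypothetical.

```python
import numpy as np

# Illustrative sketch: helper names, sampling scheme and parameters are hypothetical.

def block_diag(*mats):
    """Place matrices along the diagonal of a zero matrix."""
    out = np.zeros((sum(m.shape[0] for m in mats), sum(m.shape[1] for m in mats)))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r, c = r + m.shape[0], c + m.shape[1]
    return out

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def gen_eigh(A, B, reg=1e-6):
    """Solve A v = lam B v via Cholesky of B + reg*I; ascending eigenvalues."""
    B = (B + B.T) / 2 + reg * np.eye(B.shape[0])
    Ci = np.linalg.inv(np.linalg.cholesky(B))
    lam, E = np.linalg.eigh(Ci @ A @ Ci.T)
    return lam, Ci.T @ E

def rbf_kernel(X, Y, sigma):
    """K[a, b] = exp(-||x_a - y_b||^2 / (2 sigma^2)) for the columns of X and Y."""
    d2 = (X ** 2).sum(0)[:, None] + (Y ** 2).sum(0)[None, :] - 2 * X.T @ Y
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma ** 2))

def rekema(Xs, r_per_domain, L, Ls, Ld, sigma=1.0, mu=1.0, rng=None):
    """Solve K_rn (L + mu Ls) K_nr Lam = lam K_rn Ld K_nr Lam, an r x r problem."""
    rng = np.random.default_rng(rng)
    reps = [X[:, rng.choice(X.shape[1], r_per_domain, replace=False)] for X in Xs]
    Krn = block_diag(*[rbf_kernel(R, X, sigma) for R, X in zip(reps, Xs)])  # r x n
    lam, Lam = gen_eigh(Krn @ (L + mu * Ls) @ Krn.T, Krn @ Ld @ Krn.T)
    return lam, Lam, reps

rng = np.random.default_rng(0)
Xs = [rng.normal(size=(2, 40)), rng.normal(size=(3, 50))]
y = np.concatenate([np.r_[np.repeat([0, 1], 5), -np.ones(30, dtype=int)],
                    np.r_[np.repeat([0, 1], 5), -np.ones(40, dtype=int)]])
both = (y[:, None] >= 0) & (y[None, :] >= 0)
Ls = laplacian(((y[:, None] == y[None, :]) & both).astype(float))
Ld = laplacian(((y[:, None] != y[None, :]) & both).astype(float))
L = laplacian(block_diag(*[rbf_kernel(X, X, 1.0) for X in Xs]))
lam, Lam, reps = rekema(Xs, 6, L, Ls, Ld, rng=1)   # r = 12 instead of n = 90
```

New samples of domain $i$ would then be projected with the kernel against the $r_i$ representatives only, instead of all $n_i$ training samples.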

3.2 Stability of KEMA 

The use of KEMA in practice raises the question of the amount of data needed to provide an accurate empirical estimate, and of how the quality of the solution differs depending on the datasets. Such results have been previously derived for KPCA [20] and KPLS [21], and here we adapt them to our setting. The following properties are based on the concentration of sums of eigenvalues of the generalized KEMA eigenproblem solved using a finite number of samples, where new points are projected into the $m$-dimensional space spanned by the $m$ eigenvectors corresponding to the largest $m$ eigenvalues. Following the notation in [20], we refer to the projection onto a subspace $U$ of the eigenvectors of our eigenproblem as $P_U(\boldsymbol{\phi}(\mathbf{x}))$. We represent the projection onto the orthogonal complement of $U$ by $P_{U^\perp}(\boldsymbol{\phi}(\mathbf{x}))$. The norm of the orthogonal projection is also referred to as the residual, since it corresponds to the distance between the points and their projections.


Theorem 1 (Th. 1 and 2 in [20]) If we perform KEMA in the feature space defined by $\mathbf{K}^* = \big(\mathbf{K}(\mathbf{L} + \mu\mathbf{L}_s)\mathbf{K}\big)^{-1}\mathbf{K}\mathbf{L}_d\mathbf{K}$, then, with probability greater than $1 - \delta$ over $n$ random samples $S$, for all $1 \le m \le n$, if we project data on the space $U_m$, the expected squared residual satisfies the lower and upper bounds stated in Theorems 1 and 2 of [20], where the support of the distribution is in a ball of radius $R$ in the feature space and $\lambda_i$ and $\hat{\lambda}_i$ are the process and empirical eigenvalues, respectively.


The lower bound confirms that a good representation of the data can be achieved by using the first $m$ eigenvectors if the empirical eigenvalues decrease quickly before $\sqrt{\ell/n}$ becomes large, while the upper bound suggests that a good approximation is achievable for values of $m$ where $\sqrt{m/n}$ is small. These results can be used as a benchmark to test different approaches or to select among candidate kernels. Also, note that depending on how far from diagonal $\mathbf{K}^*$ is (i.e. how large the manifold misalignments are), the KEMA bounds may be tighter than those of KPCA. With an appropriate estimation of the manifold structures via the graph Laplacians and tuning of the kernel parameters, the performance of KEMA will be at least as good as that of KPCA.
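The bounds are driven by the tail of the empirical eigenvalues, $\hat{\lambda}^{>m} = \sum_{i>m}\hat{\lambda}_i$. The hedged check below illustrates this quantity in the simplest setting of [20], an uncentered KPCA with a linear feature map, rather than with the KEMA kernel $\mathbf{K}^*$: the mean squared residual of projecting the training points onto the top-$m$ eigenvectors equals $\hat{\lambda}^{>m}/n$. The data and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 3
# Feature vectors as columns, with decaying per-coordinate scales (arbitrary choice).
Phi = rng.normal(size=(8, n)) * np.arange(8, 0, -1)[:, None]

K = Phi.T @ Phi                                   # n x n Gram matrix
lam_hat = np.sort(np.linalg.eigvalsh(K))[::-1]    # empirical eigenvalues, descending
tail = lam_hat[m:].sum()                          # lambda_hat^{>m}

# The top-m eigenvectors of Phi Phi^T span the subspace U_m used by (uncentered) KPCA.
w, E = np.linalg.eigh(Phi @ Phi.T)
U_m = E[:, np.argsort(w)[::-1][:m]]
residual = Phi - U_m @ (U_m.T @ Phi)              # P_{U_m^perp}(phi(x_i)) for every sample
mean_sq_residual = (residual ** 2).sum(axis=0).mean()
```

A fast-decaying spectrum makes `tail` small already for small $m$, which is the situation in which the lower bound allows a good low-dimensional representation.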


4 Experimental results 

We analyze the behavior of KEMA on a series of artificial datasets with controlled levels of distortion and misalignment, and on real domain adaptation problems: visual object recognition from multi-source commercial databases, and recognition of multi-subject facial expressions.

4.1 Toy examples with controlled distortions and manifold misalignments

Setup: the first battery of experiments contains a series of toy examples composed of two domains with data matrices $\mathbf{X}_1$ and $\mathbf{X}_2$, which are spirals with three classes (see the first two columns of Fig. 1). Then, a series of deformations is applied to the second domain: scaling, rotation, inversion of the order of the classes, the shape of the domain (spiral or line), or the data dimensionality. For each experiment, 20 labeled samples per class were sampled in each domain, as well as 1000 randomly selected unlabeled samples. Classification performance was assessed on 1000 held-out samples from each domain.
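A toy generator in the spirit of this setup is sketched below. It is purely illustrative and makes its own assumptions (spiral parameterization, noise level, and the specific scaling and rotation values); it is not the generator used for Fig. 1, it covers only the scaling and rotation deformations, and the function names are hypothetical.

```python
import numpy as np

def three_spirals(n_per_class, noise=0.05, turns=1.0, seed=None):
    """Hypothetical generator: three intertwined 2-D spirals, one per class; samples as columns."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for c in range(3):
        t = np.sqrt(rng.uniform(0.05, 1.0, n_per_class)) * 2 * np.pi * turns
        ang = t + 2 * np.pi * c / 3                   # offset each arm by 120 degrees
        arm = np.vstack([t * np.cos(ang), t * np.sin(ang)])
        X.append(arm + noise * rng.normal(size=arm.shape))
        y.append(np.full(n_per_class, c))
    return np.hstack(X), np.concatenate(y)

def scale_rotate(X, scale=1.0, angle=0.0):
    """Deform a 2-D domain by a rotation of `angle` radians followed by isotropic scaling."""
    R = np.array([[np.cos(angle), -np.sin(angle)],
                  [np.sin(angle), np.cos(angle)]])
    return scale * (R @ X)

X1, y1 = three_spirals(200, seed=0)                   # first domain
X2, y2 = three_spirals(200, seed=1)
X2 = scale_rotate(X2, scale=2.0, angle=np.pi / 4)     # deformed second domain
```

Scaling and rotation are linear deformations, which is consistent with the observation below that the linear KEMA (SSMA) handles experiments #1 and #4 but not the stronger deformations.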

Latent space and domain adaptation: Figure 1 illustrates the projections obtained by KEMA when using a linear and an RBF kernel (the lengthscale was set as the average distance between labeled samples), and the classification errors for the samples from the source domain (7th column) and the target domain (8th column). The linear KEMA (SSMA) can effectively align the domains in experiments #1 and #4, which are basically scalings and rotations of the data. However, it fails in experiments #2 and #3, where the manifolds have undergone stronger deformations. The use of a nonlinear kernel allows a much more flexible solution, performing a sort of unfolding plus alignment in all experiments. In experiment #1, even if the alignment is correct, the linear classifier trained on the projections of KEMA_lin and SSMA cannot resolve the classification of the two domains, while the KEMA_RBF solution provides a latent space where both domains can be classified correctly. Experiment #2 shows a different picture: the baseline error (green line) is much smaller in the source domain, since the dataset in 3D is linearly separable. Even if the classification of this first domain is correct for all methods, the classification of the second domain after SSMA/KEMA_lin projection is poor, since their projection in the latent space does not unfold the blue spiral. KEMA_RBF provides the best result. For experiment #3, the same trend as in experiment #2 is observed. Finally, experiment #4 shows a very accurate baseline (both domains are linearly separable in the input spaces), and all methods provide high classification accuracies. Again, KEMA_RBF provides the best match between the domains in the latent space.
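The lengthscale heuristic (average distance between labeled samples) can be written as the helper below. The name `mean_pairwise_distance` is hypothetical, and averaging over all pairs of distinct samples is our reading of "average distance".

```python
import numpy as np

def mean_pairwise_distance(X):
    """Hypothetical helper: average Euclidean distance over all pairs of distinct columns of X."""
    diff = X[:, :, None] - X[:, None, :]
    D = np.sqrt((diff ** 2).sum(axis=0))      # n x n distance matrix, zero diagonal
    n = X.shape[1]
    return D.sum() / (n * (n - 1))            # exclude the n zero self-distances

X_lab = np.array([[0.0, 3.0, 0.0],
                  [0.0, 0.0, 4.0]])           # three labeled samples as columns
sigma = mean_pairwise_distance(X_lab)         # (3 + 4 + 5) / 3 = 4.0
```

The resulting value would then be used as the RBF lengthscale $\sigma$ in $\exp(-\|\mathbf{x}_a - \mathbf{x}_b\|^2/(2\sigma^2))$.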

Alignment with REKEMA: We now consider the reduced-rank approximation of KEMA proposed in Section 3.1, using the data of experiment #1 above. Figure 2 illustrates the solutions of the standard SSMA (or KEMA with a linear kernel) and of REKEMA using a varying rate of samples. We also give the classification accuracies of an SVM (with both a linear and an RBF kernel) in the projected latent space. Samples were randomly chosen, and the sigma parameter for the RBF kernel in KEMA was fixed to the average distance between all used labeled samples. We can observe that SSMA successfully aligns the two domains, but we still need to resort to nonlinear classification to achieve good results. REKEMA, on the contrary, essentially performs two operations simultaneously: alignment and data unfolding. Excessive sparsification leads to poor results. Virtually no difference between the full and the reduced-rank solutions is observed for small values of $r$: just 10% of the examples are actually needed to saturate accuracies.

Invertibility of the projections: Figure 3 shows the results of invertibility of SSMA and KEMA (using Eq. (3)) on the previous toy examples. We use a linear kernel for the inversion part (latent-to-source) and either a linear or an RBF kernel for the direct part (target-to-latent space). All results are shown in the source domain space. All the other settings (number of labeled and unlabeled samples, $\mu$, graphs) are kept as in the experiments shown in Fig. 1. The reconstruction error, averaged over 10 runs, is also reported: KEMA is capable of inverting the projections and is always as accurate as the SSMA method in the simplest cases (#1, #4). For the cases related to higher levels of deformation, KEMA is either as accurate as SSMA (#3, where the inversion is basically a projection on a line) or significantly better: e.g. in experiment #2, where the two domains are strongly deformed, only KEMA with an RBF kernel achieves a satisfying inversion, as it unfolds the target domain and then only needs a rotation to match the distribution in the source domain.

Figure 1: Illustration of linear and kernel manifold alignment on the toy experiments. Left to right: data in the original domains ($\mathbf{X}_1$, $\mathbf{X}_2$) and per class, data projected with the linear and the RBF kernels, and error rates as a function of the extracted features when predicting data for the first (left inset) or the second (right inset) domain (KEMA_lin, KEMA_RBF, SSMA, Baseline).

Figure 2: Linear and kernel manifold alignment on the scaled intertwined spirals toy experiment (Exp. #1 in Fig. 1). REKEMA is compared to SSMA for different rates of training samples (we used $l_i = 100$ and $u_i = 50$ per class for both domains).

Figure 3: Domain inversion with SSMA and KEMA. Colors distinguish the samples in the source domain, the target domain samples projected onto the source domain, and the class distributions. Each plot shows the result of a single run, and the $\ell_2$-norm reconstruction error averaged over 10 runs.


4.2 Visual object recognition in multi-modal datasets 

We here evaluate KEMA on visual object recognition tasks by using the dataset introduced in [22]. We consider the four domains Webcam (W), Caltech (C), Amazon (A) and DSLR (D), and selected the 10 common classes in the four datasets following [10]. By doing so, the domains contain 295 (Webcam), 1123 (Caltech), 958 (Amazon) and 157 (DSLR) images, respectively. The features were extracted as described in [22]: we use an 800-dimensional normalized histogram of visual words obtained from a codebook constructed from a subset of the Amazon dataset on points of interest detected by the Speeded Up Robust Features (SURF) method. We used the same experimental setting as [9, 10], in order to compare with these unsupervised domain adaptation methods. Additionally, we compare our proposal with the following semi-supervised domain adaptation methods: SGF [9], GFK [10], OT-lab [11] and MMDT [12].

For all methods, we used 20 labeled samples per class in the source domain for the C, A and D domains and 8 samples per class for the W domain. After alignment, an ordinary 1-NN classifier was trained with the labeled samples. The same labeled samples in the source domain were used to define the PLS eigenvectors for GFK and OT-lab. For all the methods using labeled samples in the target domain (including KEMA), we used 3 labeled samples in the target domain to define the projections.
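After alignment, the 1-NN rule is applied in the latent space. A minimal NumPy version, assuming Euclidean distance and latent features stored as columns, could look as follows (`one_nn` is a hypothetical name).

```python
import numpy as np

def one_nn(F_train, y_train, F_test):
    """Hypothetical helper: give each test column the label of its nearest training column."""
    d2 = ((F_train ** 2).sum(0)[:, None] + (F_test ** 2).sum(0)[None, :]
          - 2 * F_train.T @ F_test)                 # n_train x n_test squared distances
    return y_train[np.argmin(d2, axis=0)]

F_train = np.array([[0.0, 10.0]])                   # two labeled latent samples (N_f = 1)
y_train = np.array([0, 1])
pred = one_nn(F_train, y_train, np.array([[1.0, 9.0, 4.0]]))   # -> [0, 1, 0]
```

Because all domains live in the same latent space after alignment, labeled samples from any domain can serve as the training set for test samples of another.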

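We used sensible kernels for this problem in KEMA: the (fast) histogram intersection kernel, $K_\cap(\mathbf{x}_j, \mathbf{x}_k) = \sum_{p=1}^{d}\min\{x_j^p, x_k^p\}$, and the $\chi^2$ kernel, $K_{\chi^2}(\mathbf{x}_j, \mathbf{x}_k) = \exp(-\chi^2/(2\sigma^2))$, with $\chi^2$ the chi-squared distance between the two histograms [23]. We used $u = 300$ unlabeled samples to compute the graph Laplacians, for which a $k$-NN graph with $k = 21$ was used.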
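Both kernels are straightforward to evaluate on histogram columns. The sketch below is hedged: it assumes one common definition of the chi-squared distance, $\chi^2 = \sum_p (x_j^p - x_k^p)^2/(x_j^p + x_k^p)$, adds a small epsilon to avoid 0/0 on empty bins, and the function names are hypothetical.

```python
import numpy as np

def hist_intersection(Xa, Xb):
    """Hypothetical helper: K[a, b] = sum_p min(x_a^p, x_b^p) for histogram columns."""
    return np.minimum(Xa[:, :, None], Xb[:, None, :]).sum(axis=0)

def chi2_kernel(Xa, Xb, sigma, eps=1e-12):
    """Hypothetical helper: K[a, b] = exp(-chi2 / (2 sigma^2)),
    with chi2 = sum_p (x_a^p - x_b^p)^2 / (x_a^p + x_b^p) (assumed definition)."""
    num = (Xa[:, :, None] - Xb[:, None, :]) ** 2
    den = Xa[:, :, None] + Xb[:, None, :] + eps    # eps guards bins empty in both histograms
    return np.exp(-(num / den).sum(axis=0) / (2 * sigma ** 2))

H = np.array([[0.5, 0.2],
              [0.5, 0.3],
              [0.0, 0.5]])                          # two 3-bin normalized histograms as columns
K_int = hist_intersection(H, H)                     # off-diagonal: 0.2 + 0.3 + 0.0 = 0.5
K_chi = chi2_kernel(H, H, sigma=1.0)
```

Broadcasting materializes a $d \times n \times m$ array, so for the 800-dimensional histograms used here one would evaluate the kernels in chunks of samples.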

The numerical results obtained in all eight problems are reported in Table 1: KEMA outperforms all the unsupervised competing methods and, in most cases, improves on the results obtained by the semi-supervised methods using labels in the source domain only. KEMA provides the most accurate results in 3 out of the 8 settings when compared to state-of-the-art (semi-)supervised algorithms, and performance similar to the state-of-the-art GFK in 6 out of the 8 settings. KEMA is as accurate as the state of the art, but with the advantages of naturally handling domains of different dimensionality and of not requiring a discriminative classifier to align the domains, as MMDT does.


Table 1: Accuracy in the visual object recognition study (C: Caltech, A: Amazon, D: DSLR, W: Webcam). 1-NN classification testing on all samples from the target domain (#domain: number of labels per class, * = results reported in [11], † = results reported in [12]).
Row  C→A         C→D         A→C         A→W         W→C         W→A         D→A         D→W         Mean
1    21.4 ± 3.7  12.3 ± 2.8  19.9 ± 1.9  17.5 ± 3.7  24.2 ± 1.4  27.0 ± 1.5  19.0 ± 2.2  37.4 ± 3.0  22.34
2    36.8 ± 0.5  32.6 ± 0.7  35.3 ± 0.5  31.0 ± 0.7  21.7 ± 0.4  27.5 ± 0.5  32.0 ± 0.4  66.0 ± 0.5  35.36
3    36.9 ± 0.4  35.2 ± 1.0  35.6 ± 0.4  34.4 ± 0.9  27.2 ± 0.5  31.1 ± 0.7  32.5 ± 0.5  74.9 ± 0.6  38.48
4    40.4 ± 0.7  41.1 ± 1.3  37.9 ± 0.4  35.7 ± 0.9  29.3 ± 0.4  35.5 ± 0.7  36.1 ± 0.4  79.1 ± 0.7  41.89
5    43.5 ± 2.1  41.8 ± 2.8  35.2 ± 0.8  38.4 ± 5.4  35.5 ± 0.9  40.0 ± 1.0  34.9 ± 1.3  84.2 ± 1.0  44.19
6    44.7 ± 0.8  57.7 ± 1.1  36.0 ± 0.5  58.6 ± 1.0  31.1 ± 0.6  44.1 ± 0.4  45.7 ± 0.6  76.5 ± 0.5  49.30
7    49.4 ± 0.8  56.5 ± 0.9  36.4 ± 0.8  64.6 ± 1.2  32.2 ± 0.8  47.7 ± 0.9  46.9 ± 1.0  74.1 ± 0.8  50.98
8    47.1 ± 3.0  61.5 ± 2.8  29.5 ± 3.0  65.4 ± 2.7  32.9 ± 3.3  44.9 ± 4.5  44.2 ± 3.1  64.1 ± 2.9  48.70
9    47.9 ± 3.2  63.4 ± 3.4  30.4 ± 3.3  66.5 ± 2.9  32.4 ± 3.0  45.9 ± 3.9  45.2 ± 3.4  66.7 ± 3.1  49.80
10   35.4 ± 2.4  65.1 ± 1.9  28.4 ± 1.6  63.5 ± 2.6  28.4 ± 1.6  35.4 ± 2.4  35.4 ± 2.4  63.5 ± 2.6  44.39


4.3 Recognition of facial expressions in multi-subject databases 

This experiment deals with the task of recognizing facial expressions. We used the dataset in [24], where 185 photos of three subjects depicting three facial expressions (happy, neutral and shocked) are available. Each image is 217 × 308 pixels and we take each pixel as one dimension for classification: the problem is 200,508-dimensional. Each {subject, expression} pair has around 20 repetitions.

Different subjects represent the domains and we align them with respect to the 
three expression classes. We used only three labeled examples per class and subject. 
Results are given in Fig. 4(a): since it works directly in the dual, KEMA can effectively cast the three-domain problem into a single ten-dimensional latent space, where all domains are classified with less than 5% error. This shows an additional advantage of KEMA with respect to SSMA in high-dimensional spaces: SSMA would have required solving a 601,524-dimensional eigenproblem, while KEMA solves only a 55-dimensional problem. Figures 4(b)-(d) present different visualizations of the first two dimensions of the latent space: subject #1 seems to be the most difficult to align with the two others, a difficulty that is also reflected in the higher classification errors. Indeed, subject #1 shows little variation in his facial traits from one expression to another compared to the other subjects (see Fig. 3 in [24]).


Figure 4: Results of the classification of facial expressions. Panels: (a) error rates as a function of N_f; (b) subjects (Subject 1, Subject 2, Subject 3); (c) labels; (d) predictions.


5 Conclusions 

We introduced a kernel method for semi-supervised manifold alignment. We want to stress that this particular kernelization goes beyond a standard academic exercise, as the method addresses many problems in the literature of domain adaptation and manifold learning. The so-called KEMA can align an arbitrary number of domains of different dimensionality without needing corresponding pairs, just a few labeled examples in all domains. We also showed that KEMA generalizes SSMA when using a linear kernel, which allows us to deal with high-dimensional data efficiently in the dual form. Working in the dual can be computationally costly because of the construction of the graph Laplacians and the size of the involved kernel matrices. The Laplacians can be computed just once and off-line, while, regarding the size of the kernels, we introduced a reduced-rank version that allows working with a fraction of the samples while maintaining the accuracy of the representation. Advantageously, KEMA can align manifolds of very different structures and dimensionality, performing a sort of manifold unfolding along with the alignment. Importantly, the inversion of the KEMA projections has a closed-form solution without the need of pre-imaging. This is an important feature that allows synthesis applications and, more remarkably, allows one to study and characterize the distortion of the manifolds in physically meaningful units. To the authors' knowledge, this is the first method to address all these important issues at once. All these features were illustrated through toy examples and real problems in computer vision and machine learning.